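# create a word cloud
# (bday_words, one row per word, and cbPalette, a colour-blind-friendly
#  palette, are assumed to be created in an earlier setup chunk)
bday_words %>% 
  count(word, sort = TRUE) %>% 
  with(wordcloud::wordcloud(word,
                            freq = n,
                            min.freq = 1,
                            max.words = 100,
                            random.order = FALSE,
                            colors = cbPalette))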

We are celebrating the birthday of the one and only Bethany Lassetter. To satiate Bethany’s thirst for data visualization, we solicited birthday messages from Bethany’s affiliates and visualized what they had to say about the day of her birth. To protect the anonymity of Bethany Affiliates, we present their real names.

1 Methods

1.1 Participants

Participants were recruited through a Slack channel and e-mail; see Figure 1.1 for an example of the recruitment material. The final sample of 10 Bethany Affiliates consisted mostly of good people; see the Appendix for the full list of participants.

Figure 1.1: An example of recruitment emails.

2 Results

We first present the descriptive statistics of the birthday messages. Message length ranged from 17 to 80 words, with an average of 54 words (SD = 22.31). The longest message, at 80 words, was from Joel. See Figure 2.1 for the word count of each Bethany Affiliate.

bday_msg %>% 
  tidytext::unnest_tokens(word, msg) %>%   # one row per word
  group_by(sender) %>% 
  summarize(word_count = n()) %>%          # number of words per sender
  ggplot(aes(x = reorder(sender, -word_count),
             y = word_count)) +
  # highlight the longest message in orange, the rest in grey
  geom_col(aes(fill = ifelse(word_count == max(word_count), "top", "others"))) + 
  geom_text(aes(label = word_count),
            fontface = "bold",
            vjust = "bottom") +
  scale_fill_manual(values = c("#999999", "#D55E00"),
                    guide = "none") +
  scale_y_continuous("Number of Words Used in Messages") + 
  scale_x_discrete("Name of Senders",
                   labels = stringr::str_to_title) +
  theme_classic()

Figure 2.1: Word counts by senders.
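
The descriptive statistics above (range, mean, and SD of message length) reduce to a few base-R calls once each message has a word count. The sketch below uses three made-up messages with hypothetical sender names, not the real data, and a deliberately crude tokenizer (`tokenize()` is our own helper); `tidytext::unnest_tokens()` handles punctuation and edge cases more carefully, so its counts can differ slightly.

```r
# Hypothetical toy messages (NOT the real birthday data), one per sender
msgs <- c(
  alice = "Happy birthday! Have a wonderful day",
  bob   = "Happy birthday Bethany",
  carol = "Wishing you a happy and healthy year ahead, happy birthday"
)

# Crude tokenizer (our own helper): lower-case, then split on any run of
# characters that are not letters or apostrophes, dropping empty strings
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z']+")[[1]]
  words[words != ""]
}

# Number of words in each message, named by sender
word_counts <- vapply(msgs, function(m) length(tokenize(m)), integer(1))

summary_stats <- c(
  min  = min(word_counts),
  max  = max(word_counts),
  mean = mean(word_counts),
  sd   = sd(word_counts)   # sample SD, n - 1 denominator
)
longest_sender <- names(which.max(word_counts))

print(word_counts)
print(round(summary_stats, 2))
print(longest_sender)
```

We assume the reported SD is the usual sample SD that `sd()` returns.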

Next, we examined the contents of the birthday messages. Sentiment analysis using the NRC lexicon (Mohammad & Turney, 2013) revealed that the words were mostly positive; see Figure 2.2 for the proportion of words in each sentiment and Table 2.1 for the most common word in each sentiment.

# sentiment analysis
bday_words %>% 
  # a word can carry several NRC sentiments, so it may contribute several rows
  inner_join(get_sentiments("nrc"), by = "word") %>% 
  count(sentiment, sort = TRUE) %>% 
  mutate(prop = n / sum(n)) %>% 
  # bar length = proportion of words; label = raw count
  ggplot(aes(x = reorder(sentiment, -prop),
             y = prop, 
             label = n)) +
  geom_col(fill = "#999999") +
  geom_text() +
  scale_y_continuous("Proportion of Words",
                     labels = scales::percent) + 
  scale_x_discrete("Sentiment of Words") +
  coord_flip() +
  theme_classic()

Figure 2.2: Proportion of words for each sentiment.
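
A point worth making explicit about Figure 2.2: the NRC lexicon is in long format (one row per word-sentiment pair), so an inner join drops words the lexicon does not cover and repeats a word once for every sentiment it carries. The base-R sketch below illustrates this with a hypothetical six-row mini-lexicon and toy tokens; the real lexicon, fetched with `get_sentiments("nrc")`, is far larger and its actual entries for these words may differ.

```r
# Hypothetical mini-lexicon in the same long format as the NRC lexicon
# (one row per word-sentiment pair); NOT the real NRC entries
lexicon <- data.frame(
  word      = c("happy", "happy", "happy", "dying", "dying", "wait"),
  sentiment = c("joy", "positive", "trust", "fear", "sadness", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical tokens, one row per word occurrence (like bday_words)
tokens <- data.frame(
  word = c("happy", "birthday", "happy", "wait", "dying"),
  stringsAsFactors = FALSE
)

# Inner join: "birthday" is not in this toy lexicon, so it drops out,
# and each occurrence of a word tagged with k sentiments yields k rows
tagged <- merge(tokens, lexicon, by = "word")

# Count rows per sentiment and convert to proportions
counts <- sort(table(tagged$sentiment), decreasing = TRUE)
props  <- counts / sum(counts)
print(round(props, 3))
```

Because of this repetition, the proportions describe word-sentiment tags rather than distinct words.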

We then examined the most common words in the birthday messages. Unsurprisingly, the most common word was “birthday”; Figure 2.3 shows the most common words other than “birthday.”

bday_words %>% 
  filter(word != "birthday") %>%   # drop the single most common word
  count(word, sort = TRUE) %>%
  top_frac(.05) %>%                # top 5% of words by frequency (ties kept)
  ggplot(aes(x = reorder(word, -n), 
             y = n)) +
  geom_col(aes(fill = ifelse(n == max(n), "top", "others"))) + 
  scale_fill_manual(values = c("#999999", "#D55E00"),
                    guide = "none") +
  scale_y_continuous("Frequency Counts",
                     breaks = 1:10) + 
  scale_x_discrete("Words") +
  coord_flip() +
  theme_classic()

Figure 2.3: Most common words used in birthday messages.
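
`top_frac(.05)` keeps, as we understand its implementation via `top_n()`, the rows whose frequency rank (tied words sharing the best rank) falls within the top 5% of rows, so ties at the cutoff can push the result above 5%. A base-R sketch of that rule on made-up frequencies, where `keep_top_frac()` is a hypothetical helper rather than a dplyr function:

```r
# Hypothetical word frequencies (NOT the real data)
freq <- data.frame(
  word = c("birthday", "happy", "bethany", "day", "year", "love", "hope",
           "great", "wish", "fun", "cake", "friend", "smile", "joy",
           "party", "team", "data", "plot", "best", "cheers", "time"),
  n    = c(30, 12, 12, 9, 5, 4, 4, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Drop the word we already know dominates
freq <- freq[freq$word != "birthday", ]

# Hypothetical helper: keep rows whose "min" rank of descending frequency
# is within prop * number of rows; tied words share the best rank
keep_top_frac <- function(df, prop) {
  r <- rank(-df$n, ties.method = "min")
  df[r <= prop * nrow(df), , drop = FALSE]
}

top <- keep_top_frac(freq, 0.05)
print(top)
```

Here 5% of the 20 remaining words is one row, yet two words are returned because “happy” and “bethany” tie for first place.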

Table 2.1 shows the most common word for each sentiment.

bday_words %>% 
  inner_join(get_sentiments("nrc"), by = "word") %>% 
  count(sentiment, word) %>%                   # uses of each word within a sentiment
  group_by(sentiment) %>% 
  slice_max(n, n = 1, with_ties = FALSE) %>%   # keep the most frequent word
  ungroup() %>% 
  mutate(word = paste0(word, "(", n, ")")) %>% 
  select(sentiment, word) %>% 
  pivot_wider(names_from = sentiment, values_from = word) %>% 
  knitr::kable(caption = "The most common word for each sentiment.")
Table 2.1: The most common word for each sentiment.

| anger | anticipation | disgust | fear | joy | negative | positive | sadness | surprise | trust |
|---|---|---|---|---|---|---|---|---|---|
| dying(4) | happy(43) | dying(4) | dying(5) | happy(59) | wait(9) | happy(73) | runaway(6) | birthday(27) | happy(39) |

Note. The number in parentheses indicates how many times the word was used.
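
Finding the most common word in each sentiment requires counting (sentiment, word) pairs, not the total number of words per sentiment. A base-R sketch on hypothetical sentiment-tagged tokens (standing in for the result of the NRC join); breaking ties alphabetically is our own choice.

```r
# Hypothetical sentiment-tagged tokens (one row per word occurrence x sentiment),
# standing in for the output of the lexicon join; NOT the real data
tagged <- data.frame(
  sentiment = c("joy", "joy", "joy", "joy", "fear", "fear", "fear"),
  word      = c("happy", "happy", "happy", "fun", "dying", "dying", "wait"),
  stringsAsFactors = FALSE
)

# Count each (sentiment, word) pair
tagged$n <- 1
pair_counts <- aggregate(n ~ sentiment + word, data = tagged, FUN = sum)

# Within each sentiment, sort by descending count (ties broken alphabetically)
# and keep the first row per sentiment
pair_counts <- pair_counts[order(pair_counts$sentiment, -pair_counts$n,
                                 pair_counts$word), ]
top_words <- pair_counts[!duplicated(pair_counts$sentiment), ]

# One labelled cell per sentiment, e.g. "happy (3)"
labels <- setNames(paste0(top_words$word, " (", top_words$n, ")"),
                   top_words$sentiment)
print(labels)
```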

Are messages with many anger words shorter than those with many joy words? We examined whether the number of words of a particular sentiment is associated with the total number of words in a given message. As shown in Figure 2.4, any association between the word count for a given sentiment and message length is most likely nil, and inconclusive at best.

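bday_words %>% 
  inner_join(get_sentiments("nrc"), by = "word") %>% 
  count(sender, sentiment) %>%   # n: words of each sentiment per sender
  group_by(sender) %>% 
  # total_n: all sentiment-tagged words per sender (a word with several
  # NRC sentiments is counted once per sentiment)
  mutate(total_n = sum(n)) %>% 
  ungroup() %>% 
  ggplot(aes(x = n,
             y = total_n,
             color = sentiment)) + 
  geom_jitter(alpha = .7) +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  scale_x_continuous("Number of Words") +
  scale_y_continuous("Total Number of Words") + 
  # cbPalette plus two extra colours to cover the 10 NRC sentiments
  scale_color_manual(values = c(cbPalette, "#000000", "#FFC20A")) +
  theme_classic()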

Figure 2.4: Association between sentiment word counts and message length.
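
`geom_smooth(method = "lm")` fits a separate ordinary least-squares line for each colour group, here each sentiment. To read the slopes as numbers rather than from the plot, the same per-sentiment regressions can be fitted directly. The sketch below does so in base R on made-up counts for two sentiments, so its slopes say nothing about the real messages.

```r
# Hypothetical per-sender counts for two sentiments (NOT the real data):
# n = words of that sentiment, total_n = total words for that sender
counts <- data.frame(
  sentiment = rep(c("joy", "anger"), each = 5),
  n         = c(1, 2, 3, 4, 5,        0, 1, 2, 1, 0),
  total_n   = c(12, 15, 15, 19, 19,   40, 38, 41, 39, 42),
  stringsAsFactors = FALSE
)

# One linear model per sentiment, mirroring one smoothing line per colour
fits <- lapply(split(counts, counts$sentiment),
               function(d) lm(total_n ~ n, data = d))

# Slope of total_n on n for each sentiment
slopes <- vapply(fits, function(f) unname(coef(f)["n"]), numeric(1))
print(round(slopes, 3))
```

With only ten senders, each per-sentiment line rests on at most ten points, which is one reason the bands in Figure 2.4 are wide.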

We then classified the words by part of speech and plotted the most common adjectives, nouns, and verbs.

bday_words %>% 
  inner_join(parts_of_speech, by = "word") %>% 
  # merge the verb subtypes (e.g. "Verb (transitive)") into a single "Verb" label
  mutate(pos = ifelse(str_detect(pos, "Verb"), "Verb", pos)) %>% 
  filter(pos %in% c("Adjective", "Noun", "Verb")) %>% 
  split(.$pos) %>% 
  # one plot per part of speech: .x is the data, .y its name;
  # printing inside iwalk() keeps the list names out of the output
  iwalk(~ print(
    .x %>% 
      count(word, sort = TRUE) %>% 
      top_frac(.1) %>% 
      ggplot(aes(x = reorder(word, -n), 
                 y = n)) +
      geom_col(aes(fill = ifelse(n == max(n), "top", "others"))) + 
      scale_y_continuous("Frequency Counts") + 
      scale_x_discrete("Words") +
      scale_fill_manual(values = c("#999999", "#D55E00"),
                        guide = "none") +
      labs(title = .y) +
      coord_flip() +
      theme_classic()
  ))
Most common words by part of speech (separate plots for Adjective, Noun, and Verb).
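
Part-of-speech tagging here is a dictionary lookup: each word is joined to every tag listed for it, so a word with several tags is counted under each of them. The base-R sketch below uses a hypothetical lookup table in the style of tidytext's `parts_of_speech` data (the labels mimic its verb subtypes, e.g. “Verb (transitive)”, as we understand them) and collapses all verb subtypes into “Verb” as the chunk above does.

```r
# Hypothetical word-to-tag lookup (one row per word-tag pair); NOT the real data
pos_lookup <- data.frame(
  word = c("happy", "cake", "wish", "wish", "celebrate", "great"),
  pos  = c("Adjective", "Noun", "Noun", "Verb (transitive)",
           "Verb (usu participle)", "Adjective"),
  stringsAsFactors = FALSE
)

# Hypothetical tokens, one row per word occurrence
tokens <- data.frame(
  word = c("happy", "wish", "cake", "happy", "celebrate", "yay"),
  stringsAsFactors = FALSE
)

# Lookup via inner join: untagged words ("yay") drop out,
# multi-tag words ("wish") yield one row per tag
tagged <- merge(tokens, pos_lookup, by = "word")

# Collapse every verb subtype into a single "Verb" label
tagged$pos[grepl("Verb", tagged$pos)] <- "Verb"
tagged <- tagged[tagged$pos %in% c("Adjective", "Noun", "Verb"), ]

# Word frequencies within each part of speech
by_pos <- lapply(split(tagged$word, tagged$pos),
                 function(w) sort(table(w), decreasing = TRUE))
print(by_pos)
```

In this toy example “wish” is counted under both Noun and Verb, while “yay” is absent from the lookup and disappears.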

3 Discussion

The aim of the present work was two-fold: (1) to indulge Bethany’s obsession with data visualization and (2) to celebrate Bethany’s birthday in the time of a pandemic. Our work is the first to demonstrate that birthday wishes can be taken too far, and that our love for Bethany cannot fit into a basic paper card.

Picture credit: Appa & Yang

4 Appendix

# full list of messages, sorted by sender
DT::datatable(bday_msg %>% 
                mutate(sender = stringr::str_to_title(sender)) %>% 
                arrange(sender), 
              extensions = "Scroller",
              options = list(
                autoWidth = TRUE,
                # narrow first column (sender); column indices are 0-based
                columnDefs = list(list(width = '10%', targets = 0))),
              rownames = FALSE)